data worker
- North America > United States > Arizona (0.04)
- Europe > France (0.04)
- North America > United States > Virginia (0.04)
- (9 more...)
- Questionnaire & Opinion Survey (1.00)
- Personal > Interview (1.00)
- Research Report > New Finding (0.67)
- Overview (0.67)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- (2 more...)
- Information Technology > Data Science (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- (4 more...)
Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings
Hamna, Bhat, Gayatri, Mukherjee, Sourabrata, Lalani, Faisal, Hadfield, Evan, Siddarth, Divya, Bali, Kalika, Sitaram, Sunayana
Large Language Models (LLMs) are typically evaluated through general or domain-specific benchmarks testing capabilities that often lack grounding in the lived realities of end users. Critical domains such as healthcare require evaluations that extend beyond artificial or simulated tasks to reflect the everyday needs, cultural practices, and nuanced contexts of communities. We propose Samiksha, a community-driven evaluation pipeline co-created with civil-society organizations (CSOs) and community members. Our approach enables scalable, automated benchmarking through a culturally aware, community-driven pipeline in which community feedback informs what to evaluate, how the benchmark is built, and how outputs are scored. We demonstrate this approach in the health domain in India. Our analysis highlights how current multilingual LLMs address nuanced community health queries, while also offering a scalable pathway for contextually grounded and inclusive LLM evaluation.
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- (8 more...)
- Research Report (1.00)
- Personal > Interview (0.93)
- Overview (0.93)
Secondary Stakeholders in AI: Fighting for, Brokering, and Navigating Agency
Ajmani, Leah Hope, Abdelkadir, Nuredin Ali, Chancellor, Stevie
As AI technologies become more human-facing, there have been numerous calls to adapt participatory approaches to AI development -- spurring the idea of participatory AI. However, these calls often focus only on primary stakeholders, such as end-users, and not secondary stakeholders. This paper seeks to translate the ideals of participatory AI to a broader population of secondary AI stakeholders through semi-structured interviews. We theorize that meaningful participation involves three participatory ideals: (1) informedness, (2) consent, and (3) agency. We also explore how secondary stakeholders realize these ideals by traversing a complicated problem space. Like walking up the rungs of a ladder, these ideals build on one another. We introduce three stakeholder archetypes: the reluctant data contributor, the unsupported activist, and the well-intentioned practitioner, who must navigate systemic barriers to achieving agentic AI relationships. We envision an AI future where secondary stakeholders are able to meaningfully participate with the AI systems they influence and are influenced by.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York > New York County > New York City (0.07)
- (8 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.93)
- Health & Medicine (0.94)
- Social Sector (0.93)
- Law (0.93)
- (2 more...)
FactFlow: Automatic Fact Sheet Generation and Customization from Tabular Dataset via AI Chain Design & Implementation
Vu, Minh Duc, Chen, Jieshan, Xing, Zhenchang, Lu, Qinghua, Xu, Xiwei, Fu, Qian
With the proliferation of data across various domains, there is a critical demand for tools that enable non-experts to derive meaningful insights without deep data analysis skills. To address this need, existing automatic fact sheet generation tools offer heuristic-based solutions to extract facts and generate stories. However, they inadequately grasp the semantics of the data and struggle to generate narratives that fully capture the dataset's meaning or align the fact sheet with specific user needs. Addressing these shortcomings, this paper introduces FactFlow, a novel tool designed for the automatic generation and customisation of fact sheets. FactFlow applies the concept of collaborative AI workers to transform raw tabular datasets into comprehensive, visually compelling fact sheets. We define an effective taxonomy to profile AI workers for specialised tasks. Furthermore, FactFlow empowers users to refine these fact sheets through intuitive natural language commands, ensuring the final outputs align closely with individual preferences and requirements. Our user evaluation with 18 participants confirms that FactFlow not only surpasses state-of-the-art baselines in automated fact sheet production but also provides a positive user experience during customization tasks.
- Media > Film (0.46)
- Information Technology > Security & Privacy (0.46)
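The heuristic fact-extraction step that the abstract above contrasts with its AI-chain approach can be illustrated with a minimal sketch. The `extract_facts` helper and its rules (extreme values, a simple trend check) are hypothetical illustrations of heuristic-based fact sheet tools in general, not FactFlow's actual pipeline:

```python
# Minimal sketch of heuristic fact extraction from a tabular dataset.
# The rules below (extremes, monotonic trend) are illustrative only.

def extract_facts(rows, value_key, label_key):
    """Derive a few candidate 'facts' from a list of row dicts."""
    facts = []
    top = max(rows, key=lambda r: r[value_key])
    low = min(rows, key=lambda r: r[value_key])
    facts.append(f"{top[label_key]} has the highest {value_key} ({top[value_key]})")
    facts.append(f"{low[label_key]} has the lowest {value_key} ({low[value_key]})")
    values = [r[value_key] for r in rows]
    if all(a <= b for a, b in zip(values, values[1:])):
        facts.append(f"{value_key} increases monotonically across rows")
    return facts

sales = [
    {"quarter": "Q1", "revenue": 120},
    {"quarter": "Q2", "revenue": 150},
    {"quarter": "Q3", "revenue": 180},
]
for fact in extract_facts(sales, "revenue", "quarter"):
    print(fact)
```

Such rules are fast but, as the paper notes, blind to what the data means; they cannot tell whether the "highest revenue" fact is the story a reader actually needs.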
A Taxonomy of Challenges to Curating Fair Datasets
Zhao, Dora, Scheuerman, Morgan Klaus, Chitre, Pooja, Andrews, Jerone T. A., Panagiotidou, Georgia, Walker, Shawn, Pine, Kathleen H., Xiang, Alice
Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.
- North America > United States > Arizona (0.04)
- Europe > France (0.04)
- North America > United States > Illinois (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Personal > Interview (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- (2 more...)
Automatic Histograms: Leveraging Language Models for Text Dataset Exploration
Reif, Emily, Qian, Crystal, Wexler, James, Kahng, Minsuk
Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data workers often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicity or topics, are relevant to many datasets, but many interesting features are domain specific: instruments and genres for a music dataset, or diseases and symptoms for a medical dataset. Accordingly, data workers often run custom analyses for each dataset, which is cumbersome and difficult. We present AutoHistograms, a visualization tool leveraging LLMs. AutoHistograms automatically identifies relevant features, visualizes them with histograms, and allows the user to interactively query the dataset for categories of entities and create new histograms. In a user study with data workers (n=10), we observe that participants can quickly identify insights and explore the data using AutoHistograms, and conceptualize a broad range of applicable use cases. Together, this tool and user study contribute to the growing field of LLM-assisted sensemaking tools.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Virginia (0.04)
- (8 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report (0.83)
We are all AI's free data workers
The secret to making AI chatbots sound smart and spew less toxic nonsense is to use a technique called reinforcement learning from human feedback, which uses input from people to improve the model's answers. It relies on a small army of human data annotators who evaluate whether a string of text makes sense and sounds fluent and natural. They decide whether a response should be kept in the AI model's database or removed. Even the most impressive AI chatbots require thousands of human work hours to behave in a way their creators want them to, and even then they do it unreliably. The work can be brutal and upsetting, as we will hear this week when the ACM Conference on Fairness, Accountability, and Transparency (FAccT) gets underway.
Why is AI not a Panacea for Data Workers? An Interview Study on Human-AI Collaboration in Data Storytelling
Li, Haotian, Wang, Yun, Liao, Q. Vera, Qu, Huamin
Data storytelling plays an important role in data workers' daily jobs since it boosts team collaboration and public communication. However, to make an appealing data story, data workers spend tremendous effort on various tasks, including outlining and styling the story. Recently, a growing research trend has been exploring how to assist data storytelling with advanced artificial intelligence (AI). However, existing studies tend to focus on individual tasks in the data storytelling workflow and do not reveal a complete picture of humans' preferences for collaborating with AI. To better understand real-world needs, we interviewed eighteen data workers from both industry and academia to learn where and how they would like to collaborate with AI. Surprisingly, though the participants showed excitement about collaborating with AI, many of them also expressed reluctance and pointed out nuanced reasons. Based on their responses, we first characterize the stages and tasks in practical data storytelling workflows and the desired roles of AI. We then identify the preferred collaboration patterns in different tasks. Next, we summarize the interviewees' reasons for and against collaborating with AI. Finally, we provide suggestions for human-AI collaborative data storytelling to hopefully shed light on future related research.
- Information Technology > Human Computer Interaction > Interfaces (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
How to Break Data Silos to Drive Enterprise-Wide AI - Splice Machine
Not many people miss having to manually sort files, label papers, or search for lost forms in huge filing cabinets. That's because all these tasks have become way easier, faster, and more enjoyable since they've become digitized – computers and the internet have revolutionized the way businesses approach organization and task management. Similar to how computers and the internet made monotonous tasks faster and easier in every department, AI will transform work in every industry in the 21st century. Machine learning will automate away the most time-consuming and repetitive tasks across a company, along with offering predictions that will allow businesses to make better decisions ahead of time. Introducing these revolutionary processes takes time and specialized knowledge.